Master of Business Analytics
Supervisor, Department of Econometrics and Business Statistics
Supervisor, Department of Econometrics and Business Statistics
20 October 2025
| Model | Core Idea | Strengths | Limitations |
|---|---|---|---|
| Decision Tree | Splits data using thresholds on features. | Interpretable; handles nonlinearity. | Prone to overfitting; unstable with small changes. |
| Support Vector Machine (SVM) | Finds optimal hyperplane maximizing margin. | Effective in high dimensions. | Sensitive to kernel choice; slow for large data. |
| Neural Network (MLP) | Learns hierarchical nonlinear relationships. | Handles complex patterns; flexible. | Opaque (“black box”); computationally heavy. |
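The contrast above can be reproduced on a small dataset. A minimal sketch using scikit-learn; the dataset, split, and hyperparameters here are illustrative choices, not the experimental setup behind this table:

```python
# Illustrative comparison of the three baseline model families.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=42)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```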
| Approach | Pruning Strategy | Advantages | Limitations |
|---|---|---|---|
| Custom Tree | Makes pruning decisions at each node during tree growth | Memory efficient | Greedy approach |
| rpart | Grows the full tree first, then prunes back | Considers all pruning possibilities | Higher memory usage |
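The two pruning styles can be illustrated with scikit-learn, which supports both: early-stopping parameters mimic the custom tree's during-growth decisions, while cost-complexity pruning mimics rpart's grow-then-prune strategy. A sketch with illustrative parameters, not either implementation:

```python
# Pre-pruning vs post-pruning on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: the growth-time decision is made at each node.
pre = DecisionTreeClassifier(max_depth=2, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the full tree, then prune back with a
# cost-complexity penalty chosen from the pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # mid-path penalty
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)

print("pre-pruned leaves: ", pre.get_n_leaves())
print("post-pruned leaves:", post.get_n_leaves(),
      "(full tree had", full.get_n_leaves(), ")")
```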
Petal.Length <= 4.85 (n = 80)
  Left:
    Predict: versicolor (n = 41)
  Right:
    Petal.Length <= 4.95 (n = 39)
      Left:
        Sepal.Length <= 6.5 (n = 3)
          Left:
            Predict: virginica (n = 2)
          Right:
            Predict: versicolor (n = 1)
      Right:
        Predict: virginica (n = 36)
In-sample error: 5%
Out-of-sample error: 10%
| Feature | Our SVMODT Approach | Literature Context | Practical Advantage |
|---|---|---|---|
| Tree Construction | Recursive linear SVM at each node. | Similar to Nie (2019) DTSVM. | Fast training; scalable to large datasets. |
| Split Criterion | SVM decision values determine splits. | Standard approach across all methods. | Mathematically principled; maximizes margin. |
| Scaling | Node-specific scaling at each split. | Novel contribution; not in literature. | Prevents feature scale issues; more robust. |
| Class Weights | Balanced or custom weights per node. | Similar to Bala & Agrawal (2010). | Handles imbalanced data effectively. |
| Feature Selection | Random/Mutual Info/Correlation with penalties. | Enhanced with penalties (novel). | Promotes feature diversity; reduces overfitting. |
| Hyperparameters | max_depth, min_samples, max_features. | Fewer parameters than kernel methods. | Easy to tune; less prone to overfitting. |
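A single SVMODT split as described in the table might look like the sketch below: a linear SVM fitted at the node, with node-specific scaling and balanced class weights, and the split determined by the sign of the SVM decision value. The `svm_split` helper, its parameters, and the toy data are illustrative assumptions, not the thesis implementation:

```python
# Sketch of one node split in an SVM-based oblique decision tree.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def svm_split(X, y, feature_idx, C=1.0):
    """Fit a linear SVM on a feature subset and partition the node's
    samples by the sign of the SVM decision value."""
    scaler = StandardScaler().fit(X[:, feature_idx])  # node-specific scaling
    Xs = scaler.transform(X[:, feature_idx])
    svm = LinearSVC(C=C, class_weight="balanced").fit(Xs, y)
    decision = svm.decision_function(Xs)              # signed margin values
    left, right = decision > 0, decision <= 0
    return (scaler, svm), left, right

# Toy usage on two separable clusters (illustrative data):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
_, left, right = svm_split(X, y, feature_idx=[0, 1])
print("left n =", left.sum(), "| right n =", right.sum())
```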
🌳 Node: depth = 1 | n = 80 | features = [Petal.Length,Sepal.Length] | max_feat = 2
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 2 | n = 40 | features = [Petal.Length,Sepal.Length] | max_feat = 2
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = versicolor | n = 33
│ └─ Right branch (SVM ≤ 0):
│ 🍃 Leaf: predict = versicolor | n = 7
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 2 | n = 40 | features = [Petal.Length,Sepal.Length] | max_feat = 2
├─ Left branch (SVM > 0):
│ 🍃 Leaf: predict = virginica | n = 5
└─ Right branch (SVM ≤ 0):
🍃 Leaf: predict = virginica | n = 35
=== Tracing Prediction Path ===
Sample 4:
  Species = 1
  Petal.Length = 4.4
  Sepal.Length = 6.7
🌳 Node 1: features = Petal.Length, Sepal.Length
  SVM decision value: 2.7364
  → Going LEFT (decision > 0)
🌳 Node 2: features = Petal.Length, Sepal.Length
  SVM decision value: 2.921
  → Going LEFT (decision > 0)
🍃 FINAL: Predict versicolor (n = 33)
Path taken: LEFT → LEFT
Final prediction: versicolor
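The routing shown in the trace can be sketched generically: each internal node holds a fitted scaler and linear SVM, and a sample goes left when the decision value is positive. The `Node` structure and the stub node models below are illustrative assumptions standing in for fitted scalers and SVMs:

```python
# Sketch of tracing one sample through a tree of SVM nodes.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Node:
    scaler: object = None           # node-specific scaler (assumed fitted)
    svm: object = None              # node-specific linear SVM (assumed fitted)
    features: list = None           # column indices used at this node
    left: Optional["Node"] = None   # taken when decision > 0
    right: Optional["Node"] = None  # taken when decision <= 0
    prediction: object = None       # set only on leaves

def trace_predict(node, x, path=()):
    if node.prediction is not None:           # leaf reached
        return node.prediction, list(path)
    xs = node.scaler.transform([x[node.features]])
    d = node.svm.decision_function(xs)[0]     # signed SVM margin
    branch, step = (node.left, "LEFT") if d > 0 else (node.right, "RIGHT")
    return trace_predict(branch, x, path + (step,))

# Minimal usage with stub node models (identity scaler, fixed decision):
class Stub:
    def __init__(self, d): self.d = d
    def transform(self, X): return X
    def decision_function(self, X): return [self.d]

root = Node(scaler=Stub(0), svm=Stub(2.7), features=[0, 1],
            left=Node(prediction="versicolor"),
            right=Node(prediction="virginica"))
pred, path = trace_predict(root, np.array([4.4, 6.7]))
print(pred, path)  # → versicolor ['LEFT']
```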
Response: Diagnosis (M = malignant, B = benign)
Features:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter² / area − 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension (“coastline approximation” - 1)
In-sample error: 3.08%
Out-of-sample error: 2.61%
🌳 Node: depth = 1 | n = 454 | features = [perimeter_worst,area_worst,radius_worst,concave.points_worst,concave.points_mean] | max_feat = 5 | penalty = ✓
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 2 | n = 273 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean] | max_feat = 5 | penalty = ⚠️
│ ├─ Left branch (SVM > 0):
│ │ 🌳 Node: depth = 3 | n = 257 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean] | max_feat = 5 | penalty = ⚠️
│ │ ├─ Left branch (SVM > 0):
│ │ │ 🍃 Leaf: predict = B | n = 222
│ │ └─ Right branch (SVM ≤ 0):
│ │ 🍃 Leaf: predict = B | n = 35
│ └─ Right branch (SVM ≤ 0):
│ 🌳 Node: depth = 3 | n = 16 | features = [area_se,fractal_dimension_worst,radius_mean,area_mean,texture_mean]
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 14
│ └─ Right branch (SVM ≤ 0):
│ (no right child)
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 2 | n = 181 | features = [texture_mean,texture_worst,area_se,concavity_worst,radius_se] | max_feat = 5 | penalty = ⚠️
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 3 | n = 28 | features = [texture_worst,perimeter_worst,radius_mean,texture_mean,perimeter_mean] | max_feat = 5 | penalty = ⚠️
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 16
│ └─ Right branch (SVM ≤ 0):
│ 🍃 Leaf: predict = M | n = 12
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 3 | n = 153 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean] | max_feat = 5 | penalty = ⚠️
├─ Left branch (SVM > 0):
│ 🍃 Leaf: predict = M | n = 13
└─ Right branch (SVM ≤ 0):
🍃 Leaf: predict = M | n = 140
In-sample error: 3.3%
Out-of-sample error: 2.61%
🌳 Node: depth = 1 | n = 454 | features = [perimeter_worst,area_worst,radius_worst,concave.points_worst,concave.points_mean] | max_feat = 5 | penalty = ✓
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 2 | n = 273 | features = [radius_mean,texture_mean,perimeter_mean,area_mean] | max_feat = 4 | penalty = ✓
│ ├─ Left branch (SVM > 0):
│ │ 🌳 Node: depth = 3 | n = 251 | features = [radius_mean,texture_mean,perimeter_mean] | max_feat = 3 | penalty = ✓
│ │ ├─ Left branch (SVM > 0):
│ │ │ 🍃 Leaf: predict = B | n = 220
│ │ └─ Right branch (SVM ≤ 0):
│ │ 🍃 Leaf: predict = B | n = 31
│ └─ Right branch (SVM ≤ 0):
│ 🌳 Node: depth = 3 | n = 22 | features = [radius_mean,area_mean,area_se]
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 20
│ └─ Right branch (SVM ≤ 0):
│ (no right child)
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 2 | n = 181 | features = [texture_mean,texture_worst,perimeter_worst,concave.points_worst] | max_feat = 4 | penalty = ✓
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 3 | n = 34 | features = [texture_worst,texture_mean,radius_mean] | max_feat = 3 | penalty = ✓
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 17
│ └─ Right branch (SVM ≤ 0):
│ 🍃 Leaf: predict = M | n = 17
└─ Right branch (SVM ≤ 0):
🍃 Leaf: predict = M | n = 147
In-sample error: 2.86%
Out-of-sample error: 0.87%
🌳 Node: depth = 1 | n = 454 | features = [perimeter_worst,area_worst,radius_worst,concave.points_worst,concave.points_mean,perimeter_mean,area_mean,area_se,radius_mean] | max_feat = 9 | penalty = ✓
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 2 | n = 278 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean] | max_feat = 7 | penalty = ⚠️
│ ├─ Left branch (SVM > 0):
│ │ 🌳 Node: depth = 3 | n = 220 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave.points_mean,symmetry_mean] | max_feat = 9 | penalty = ⚠️
│ │ ├─ Left branch (SVM > 0):
│ │ │ 🍃 Leaf: predict = B | n = 200
│ │ └─ Right branch (SVM ≤ 0):
│ │ 🍃 Leaf: predict = B | n = 20
│ └─ Right branch (SVM ≤ 0):
│ 🌳 Node: depth = 3 | n = 58 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave.points_mean] | max_feat = 8 | penalty = ⚠️
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 43
│ └─ Right branch (SVM ≤ 0):
│ 🍃 Leaf: predict = B | n = 15
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 2 | n = 176 | features = [texture_mean,texture_worst,smoothness_worst,concavity_worst] | max_feat = 4 | penalty = ⚠️
├─ Left branch (SVM > 0):
│ 🌳 Node: depth = 3 | n = 33 | features = [radius_se,perimeter_worst,concavity_mean,radius_worst] | max_feat = 4 | penalty = ⚠️
│ ├─ Left branch (SVM > 0):
│ │ 🍃 Leaf: predict = B | n = 14
│ └─ Right branch (SVM ≤ 0):
│ 🍃 Leaf: predict = M | n = 19
└─ Right branch (SVM ≤ 0):
🌳 Node: depth = 3 | n = 143 | features = [radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave.points_mean,symmetry_mean] | max_feat = 9 | penalty = ⚠️
├─ Left branch (SVM > 0):
│ 🍃 Leaf: predict = M | n = 17
└─ Right branch (SVM ≤ 0):
🍃 Leaf: predict = M | n = 126
Deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.
Response: death of a patient entering the ICU
Data Cleaning/Feature Engineering
Calculated Charlson Comorbidity Index
Counted additional diagnoses per admission
Kept only numeric predictors for model training
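Two of the cleaning steps can be sketched with pandas. The column names (`hadm_id`, `icd_code`) and the toy frames are illustrative assumptions; the Charlson index itself, which needs a full ICD mapping, is omitted:

```python
# Sketch: count extra diagnoses per admission, keep numeric predictors.
import pandas as pd

diagnoses = pd.DataFrame({
    "hadm_id": [1, 1, 1, 2, 2, 3],
    "icd_code": ["I10", "E11", "N18", "I10", "J44", "I21"],
})

# Count additional diagnoses per admission (beyond the first listed).
n_extra = (diagnoses.groupby("hadm_id").size().sub(1)
           .reset_index(name="n_extra_dx"))

admissions = pd.DataFrame({
    "hadm_id": [1, 2, 3],
    "age": [65, 72, 58],
    "ethnicity": ["A", "B", "A"],
}).merge(n_extra, on="hadm_id")

# Keep only numeric predictors for model training.
X = admissions.select_dtypes(include="number").drop(columns=["hadm_id"])
print(list(X.columns))  # columns kept for modelling
```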
| Model | SVMODT | RBF SVM | Linear SVM | Decision Tree |
|---|---|---|---|---|
| AUC | 0.790 | 0.748 | 0.713 | 0.767 |
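An AUC comparison like the one above can be produced by scoring held-out class-1 probabilities (or decision values) against the labels. A sketch on synthetic imbalanced data, with toy stand-ins for the actual models and cohort:

```python
# Sketch: AUC comparison on synthetic imbalanced binary data.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

aucs = {}
for name, m in {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RBF SVM": SVC(kernel="rbf"),
}.items():
    m.fit(X_tr, y_tr)
    # Use probabilities when available, otherwise raw decision values.
    scores = (m.predict_proba(X_te)[:, 1] if hasattr(m, "predict_proba")
              else m.decision_function(X_te))
    aucs[name] = roc_auc_score(y_te, scores)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```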
1. Hyperparameter Complexity
Multiple tuning parameters (max_depth, min_samples, max_features, penalty_weight)
Feature selection strategy requires careful consideration
2. Binary Classification Focus
Each node's linear SVM separates only two groups; multi-class tasks require extensions such as one-vs-rest
3. Interpretability Trade-off
SVM decision boundaries less intuitive than pure thresholds
Feature interactions harder to explain
4. Computational Considerations
Node-specific scaling adds overhead
Feature penalty calculations at each split
Memory requirements for storing multiple SVM models
Automated hyperparameter tuning (Bayesian/meta-learning)
Native multi-class node splits
SHAP/LIME-based explainability
Parallel & approximate SVM scalability
Ensemble variants with random feature subsets
Thank You!
Questions?